Discriminative Similarity for Clustering and Semi-Supervised Learning

نویسندگان

  • Yingzhen Yang
  • Feng Liang
  • Nebojsa Jojic
  • Shuicheng Yan
  • Jiashi Feng
  • Thomas S. Huang
چکیده

Similarity-based clustering and semi-supervised learning methods separate the data into clusters or classes according to the pairwise similarity between the data, and the pairwise similarity is crucial for their performance. In this paper, we propose a novel discriminative similarity learning framework which learns discriminative similarity for either data clustering or semi-supervised learning. The proposed framework learns classifier from each hypothetical labeling, and searches for the optimal labeling by minimizing the generalization error of the learned classifiers associated with the hypothetical labeling. Kernel classifier is employed in our framework. By generalization analysis via Rademacher complexity, the generalization error bound for the kernel classifier learned from hypothetical labeling is expressed as the sum of pairwise similarity between the data from different classes, parameterized by the weights of the kernel classifier. Such pairwise similarity serves as the discriminative similarity for the purpose of clustering and semi-supervised learning, and discriminative similarity with similar form can also be induced by the integrated squared error bound for kernel density classification. Based on the discriminative similarity induced by the kernel classifier, we propose new clustering and semi-supervised learning methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

متن کامل

On a Theory of Nonparametric Pairwise Similarity for Clustering: Connecting Clustering to Classification

Pairwise clustering methods partition the data space into clusters by the pairwise similarity between data points. The success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, where kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework by bridging the gap between clustering and multi-class class...

متن کامل

A Discriminative Latent Variable Model for Online Clustering

This paper presents a latent variable structured prediction model for discriminative supervised clustering of items called the Latent Left-linking Model (LM). We present an online clustering algorithm for LM based on a feature-based item similarity function. We provide a learning framework for estimating the similarity function and present a fast stochastic gradient-based learning technique. In...

متن کامل

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...

متن کامل

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1709.01231  شماره 

صفحات  -

تاریخ انتشار 2017